Enhancing English/Arabic CLIR Using Word Collocations and Statistical Translation and Transliteration Resources
Abstract
In Cross Language Information Retrieval (CLIR), queries in one language are used to retrieve documents in another language (or languages). This can be done through Query Translation, which faces translation and transliteration challenges, ambiguity being the main problem. This paper introduces a comprehensive solution to these challenges. (1) Four English-to-Arabic Machine Readable Dictionaries (MRDs) are introduced: one dictionary of collocations and three dictionaries of single English words, each built from a different perspective in order to examine the effect of that perspective on the final query result. (2) A modern Arabic corpus is built. (3) A comprehensive model for English-to-Arabic Query Translation is introduced; it detects and translates collocations, translates and transliterates single words, and resolves replacement ambiguity. Experimental results show that the proposed models are very effective in overcoming the challenges of Query Translation and CLIR.
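The abstract names the stages of the translation model but not their algorithms, so the following is only a minimal sketch under stated assumptions: collocations are detected by greedy longest match, single words are looked up in a dictionary, out-of-vocabulary words are treated as names and passed to a crude letter-by-letter map (a stand-in for a real statistical transliteration model), and ambiguity is resolved by choosing the candidate combination that co-occurs most often in an Arabic corpus. All dictionaries, the corpus, and the function names (`segment`, `disambiguate`, `translate_query`) are hypothetical toy placeholders, not the paper's resources or method.

```python
from itertools import combinations, product

# Hypothetical toy English->Arabic resources (NOT the paper's dictionaries).
COLLOCATIONS = {
    ("information", "retrieval"): ["استرجاع المعلومات"],
}
SINGLE_WORDS = {
    "bank": ["بنك", "ضفة"],  # financial bank vs. river bank: ambiguous
    "river": ["نهر"],
    "loan": ["قرض"],
}
# Crude letter map, a stand-in for a statistical transliteration model.
# Letters missing from the map are simply skipped.
LETTER_MAP = {"d": "د", "u": "و", "b": "ب", "a": "ا", "i": "ي"}

# Toy "Arabic corpus": each string is one document.
CORPUS = [
    "جلس على ضفة النهر",
    "ضفة نهر النيل",
    "حصل على قرض من البنك",
]


def segment(tokens, max_len=3):
    """Greedy longest match: prefer collocations over single words.

    Returns one list of Arabic candidates per translation unit."""
    units, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            gram = tuple(tokens[i:i + n])
            if n > 1 and gram in COLLOCATIONS:
                units.append(COLLOCATIONS[gram])
                i += n
                break
            if n == 1:
                word = tokens[i]
                if word in SINGLE_WORDS:
                    units.append(SINGLE_WORDS[word])
                else:  # out of vocabulary: assume a name and transliterate
                    units.append(["".join(LETTER_MAP.get(c, "") for c in word)])
                i += 1
    return units


def cooccurrence(a, b):
    """Number of corpus documents containing both terms (substring match)."""
    return sum(1 for doc in CORPUS if a in doc and b in doc)


def disambiguate(units):
    """Pick one candidate per unit, maximizing total pairwise co-occurrence."""
    best, best_score = (), -1
    for combo in product(*units):
        score = sum(cooccurrence(a, b) for a, b in combinations(combo, 2))
        if score > best_score:
            best, best_score = combo, score
    return list(best)


def translate_query(query):
    return disambiguate(segment(query.lower().split()))


print(translate_query("river bank"))             # ['نهر', 'ضفة']
print(translate_query("bank loan"))              # ['بنك', 'قرض']
print(translate_query("information retrieval"))  # ['استرجاع المعلومات']
print(translate_query("Dubai"))                  # ['دوباي']
```

The exhaustive search in `disambiguate` grows exponentially with the number of ambiguous words; that is tolerable for short queries, but a larger system would presumably prune candidates or score them greedily.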
Similar resources
English/Arabic Cross Language Information Retrieval (CLIR) for Arabic OCR-Degraded Text
In this paper, a novel model for Query Translation and Expansion, enabling English/Arabic CLIR for both normal and OCR-degraded Arabic text, has been proposed, implemented, and tested. First, an English/Arabic Word Collocations Dictionary was established, and three English/Arabic Single Words Dictionaries were reproduced. Second, a modern Arabic Corpus has been built. Third, a model for simu...
Query Translation and Expansion for Searching Normal and OCR-Degraded Arabic Text
This paper provides a novel model for English/Arabic Query Translation to search Arabic text, and then expands the Arabic query to handle Arabic OCR-Degraded Text. This includes detection and translation of word collocations, translating single words, transliterating names, and disambiguating translation and transliteration through different approaches. It also expands the query with the expect...
Statistical Approach to Transliteration from English to Punjabi
Machine transliteration plays an important role in natural language applications such as information retrieval and machine translation, especially for handling proper nouns and technical terms. Transliteration is a crucial factor in CLIR and MT. It is important for Machine Translation, especially when the languages do not use the same scripts. This paper addresses the issue of statistical mach...
Japanese/English Cross-Language Information Retrieval: Exploration of Query Translation and Transliteration
Cross-language information retrieval (CLIR), where queries and documents are in different languages, has recently become one of the major topics within the information retrieval community. This paper proposes a Japanese/English CLIR system in which we combine query translation and retrieval modules. We currently target the retrieval of technical documents, and therefore the performance of our sy...
Using Transliteration of Proper Names from Arabic to Latin Script to Improve English-Arabic Word Alignment
Bilingual lexicons of proper names play a vital role in machine translation and cross-language information retrieval. Word alignment approaches are generally used to construct bilingual lexicons automatically from parallel corpora. Aligning proper names is particularly difficult when the source and target languages of the parallel corpus do not share the same written script. We present in ...